-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BigQuery: Add load_table_from_json() method to BQ client #9070
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yay!
A few nitpicky things.
|
||
if job_config is None: | ||
job_config = job.LoadJobConfig() | ||
job_config.source_format = job.SourceFormat.NEWLINE_DELIMITED_JSON |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm having second thoughts about modifying the input config object. See:
google-cloud-python/bigquery/google/cloud/bigquery/client.py
Lines 1523 to 1529 in fcf99ce
if job_config is None: | |
job_config = job.LoadJobConfig() | |
else: | |
# Make a copy so that the job config isn't modified in-place. | |
job_config_properties = copy.deepcopy(job_config._properties) | |
job_config = job.LoadJobConfig() | |
job_config._properties = job_config_properties |
We might want to refactor that code into a helper function for creating or copying a LoadJobConfig
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, if there isn't a schema set, we should enable schema autodetect for them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason for not simply deep-copying the entire LoadJobConfig
instance instead of separately deep-copying the _properties
attribute?
The only other private attribute seems to be the _job_type
string, and __init__()
(that would not get called) does nothing apart from setting some attributes through setters, which more or less manipulate the same _properties
dict behind the scenes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One potential reason is that _properties
can be used to send values to the API for Alpha releases that aren't yet supported in client libraries. Not sure if deep copy of LoadJobConfig
would carry those over.
_properties
is the real core of the object, since that's what defines what we end up actually including in the API request.
Pushed new commits, but GitHub does not detect them. Also tried rebasing the branch, still no luck. Maybe closing and re-opening will work. Update: The same happened on the replacement PR, but then the new changes to the latter showed up after a couple of minutes. Seems like a temporary GitHub service glitch. |
Closes #4553.
This PR adds a helper client method
load_table_from_json()
.How to test
Check that the method is similar to e.g.
client.load_table_from_dataframe()
and in line with the requirements.P.S.: I will address the last issue comment separately once this is approved (replacing the
_add_rows()
helper in system tests is a separate refactoring task IMO).